Hi,
Could you please help me track down this problem?
I wrote a CUDA program that calculates the electric potential for N charged particles. I keep the number of threads per block (BLOCK_SIZE) fixed at 512 and keep increasing the number of points. From a certain size on (8K points), 512 threads/block gives wrong results, whereas 256 threads/block works fine up to 64K points.
Can you help me understand what is happening?
Thanks
This is the outline of my program:
#define BLOCK_SIZE 512
dim3 dimBlock(BLOCK_SIZE,1);
dim3 dimGrid(numberOfPoints/BLOCK_SIZE,1);
float4 *point;
float2 *result;
cudaMalloc((void **) &point, numberOfPoints * sizeof(float4));
cudaMalloc((void **) &result, numberOfPoints * sizeof(float2));
…
calculatePotential<<<dimGrid,dimBlock>>>(point,result);
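
One way to narrow this down is to check whether the launch itself fails. A kernel that cannot be launched (for example because 512 threads per block need more registers than the device provides) does not stop the program, so the result buffer simply keeps whatever it held before. The sketch below is only an illustration under assumptions: potentialSketch, blocksFor and launchChecked are hypothetical names, the extra n argument and the kernel body are stand-ins for the real calculatePotential, and the w component is assumed to hold the charge. It rounds the grid size up, guards out-of-range threads, and reports both launch-time and run-time errors.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define BLOCK_SIZE 512

// Hypothetical stand-in for the real kernel. The n argument and the body
// are assumptions made for illustration, not the original code.
__global__ void potentialSketch(const float4 *point, float2 *result, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;                       // guard threads past the last point

    float4 pi = point[i];
    float sum = 0.0f;
    for (int j = 0; j < n; ++j) {
        if (j == i) continue;                 // skip self-interaction
        float4 pj = point[j];
        float dx = pj.x - pi.x, dy = pj.y - pi.y, dz = pj.z - pi.z;
        // w is assumed to hold the charge; coincident points are not handled
        sum += pj.w * rsqrtf(dx * dx + dy * dy + dz * dz);
    }
    result[i] = make_float2(sum, 0.0f);       // second component unused here
}

// Round up so that a partially filled last block is still launched.
inline int blocksFor(int n, int blockSize)
{
    return (n + blockSize - 1) / blockSize;
}

// Launch the kernel on device pointers and report any error.
inline cudaError_t launchChecked(const float4 *d_point, float2 *d_result, int n)
{
    dim3 dimBlock(BLOCK_SIZE, 1);
    dim3 dimGrid(blocksFor(n, BLOCK_SIZE), 1);
    potentialSketch<<<dimGrid, dimBlock>>>(d_point, d_result, n);

    cudaError_t err = cudaGetLastError();     // launch-time errors (bad configuration, out of resources)
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();        // errors raised while the kernel runs
    if (err != cudaSuccess)
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
    return err;
}
```

If launchChecked returns an error only for BLOCK_SIZE 512, the block size is probably exceeding a per-block resource limit; if it returns cudaSuccess and the results are still wrong, the indexing inside the kernel is the more likely place to look.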
Ho Xung Lenh