Could you please help me figure out this problem?
I wrote a CUDA program that calculates the electric potential of N point charges. I keep the number of threads per block (BLOCK_SIZE) fixed at 512 and keep increasing the number of points. At a certain number of points (8K points), 512 threads/block gives wrong results, whereas 256 threads/block works fine up to 64K points.
Can you help me understand what is happening?
This is the outline of my program:
cudaMalloc((void **) &point, numberOfPoints * sizeof(float4));
cudaMalloc((void **) &result, numberOfPoints * sizeof(float2));
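For reference, the launch and allocation arithmetic for the sizes mentioned above can be sketched as follows. This is host-side C++ only (it also compiles inside a .cu file) and is not taken from my actual program: the helper names blocksFor, pointBytes and resultBytes are hypothetical, and I am assuming that 8K means 8192 points and 64K means 65536 points.

```cpp
#include <cstddef>

// Hypothetical helper: number of blocks needed to cover n points with
// blockSize threads per block. Ceiling division, so a partially filled
// last block is still counted.
std::size_t blocksFor(std::size_t n, std::size_t blockSize) {
    return (n + blockSize - 1) / blockSize;
}

// Hypothetical helpers: bytes requested by the two cudaMalloc calls.
// float4 holds four floats and float2 holds two, so the sizes are
// written here in terms of sizeof(float) to avoid needing CUDA headers.
std::size_t pointBytes(std::size_t n) {
    return n * 4 * sizeof(float);   // same as n * sizeof(float4)
}

std::size_t resultBytes(std::size_t n) {
    return n * 2 * sizeof(float);   // same as n * sizeof(float2)
}
```

Under these assumptions, 8192 points at 512 threads/block need 16 blocks, and 65536 points at 256 threads/block need 256 blocks, so the grid is small in both cases.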
Ho Xung Lenh