CUDA gives wrong results for a large number of points/block


Could you please help me figure out this problem?

I wrote a CUDA program that calculates the electric potential of N charged particles. I keep the number of threads per block (BLOCK_SIZE) fixed at 512 and keep increasing the number of points. Beyond a certain number of points (8K), 512 threads/block gives wrong results, whereas 256 threads/block works fine up to 64K points.
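One possibility worth ruling out, since 256 threads/block works and 512 does not: on compute capability 1.0/1.1 devices (the Quadro NVS 290 listed further down is a 1.1 device), each multiprocessor has 8192 32-bit registers, and all threads of a block must fit in them at once. If the kernel uses more than 8192 / 512 = 16 registers per thread, a 512-thread launch fails with "too many resources requested for launch" and the kernel never runs, while 256 threads/block leaves room for up to 32. The host-only C sketch below does that arithmetic; `max_regs_per_thread` and `block_fits` are hypothetical helpers, and they ignore the hardware's register allocation granularity, so treat the result as an upper bound. The real per-thread count is printed when compiling with `nvcc --ptxas-options=-v`.

```c
#include <assert.h>

/* Registers per multiprocessor on compute capability 1.0/1.1 devices
   (1.2/1.3 devices have 16384). */
#define REGS_PER_SM_CC11 8192

/* Upper bound on the registers each thread may use so that one block of
   threads_per_block threads fits on a multiprocessor. Ignores register
   allocation granularity, so the real limit can be slightly lower. */
int max_regs_per_thread(int regs_per_sm, int threads_per_block)
{
    assert(threads_per_block > 0);
    return regs_per_sm / threads_per_block;
}

/* Nonzero if a block of this size can launch at all, given the kernel's
   per-thread register count as reported by ptxas. */
int block_fits(int regs_per_sm, int threads_per_block, int regs_per_thread)
{
    return (long)threads_per_block * regs_per_thread <= regs_per_sm;
}
```

For example, a kernel that (hypothetically) uses 20 registers per thread needs 5120 registers at 256 threads/block, which fits, but 10240 at 512 threads/block, which does not.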

Can you help me understand what is happening?


This is an outline of my program:
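#define BLOCK_SIZE 512                            // threads per block
dim3 dimBlock(BLOCK_SIZE, 1);
dim3 dimGrid(numberOfPoints / BLOCK_SIZE, 1);    // assumes numberOfPoints is a multiple of BLOCK_SIZE
float4 *point;                                   // device pointers: cudaMalloc stores the allocated address here
float2 *result;
cudaMalloc((void **) &point, numberOfPoints * sizeof(float4));
cudaMalloc((void **) &result, numberOfPoints * sizeof(float2));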
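Independently of the failure itself, `numberOfPoints / BLOCK_SIZE` rounds down, so points beyond the last full block are never processed. That does not matter for exact multiples such as 8K or 64K, but the usual pattern is to round the grid size up and have each thread return early when its global index `blockIdx.x * blockDim.x + threadIdx.x` is past the end (`if (idx >= n) return;` in the kernel). The host-only C sketch below models that mapping; `grid_size` and `points_covered` are hypothetical helper names.

```c
#include <assert.h>

/* Number of blocks needed to cover n points: ceil(n / block_size). */
unsigned grid_size(unsigned n, unsigned block_size)
{
    assert(block_size > 0);
    return (n + block_size - 1) / block_size;
}

/* Models a launch of `blocks` blocks of `block_size` threads in which each
   thread computes idx = blockIdx.x * blockDim.x + threadIdx.x and skips the
   work when idx >= n. Returns how many points actually get processed. */
unsigned points_covered(unsigned n, unsigned blocks, unsigned block_size)
{
    unsigned covered = 0;
    for (unsigned b = 0; b < blocks; ++b)
        for (unsigned t = 0; t < block_size; ++t) {
            unsigned idx = b * block_size + t;
            if (idx < n)              /* the in-kernel bounds check */
                ++covered;
        }
    return covered;
}
```

With 8000 points and 512 threads/block, truncating division launches 15 blocks and processes only 7680 points; rounding up launches 16 blocks, and the bounds check discards the 192 surplus threads.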


Ho Xung Lenh


I have (I guess) the same problem with N-body interactions. With more than 64K particles my program crashes, regardless of the block or grid size. I hope someone can help.


I am seeing very strange behavior in CUDA when I change the number of points (e.g. from 4K to 8K points):

  • If I compile with 256 threads/block, run and print the results, and then recompile and run with 512 threads/block, the 512 threads/block version works fine.
  • If I compile with 512 threads/block first, the results are wrong.
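This is consistent with the 512-thread launch failing without the program noticing: memory returned by `cudaMalloc` is not cleared, so the buffer copied back may still hold results left over from the previous (256-thread) run. Two cheap checks are to call `cudaGetLastError()` right after the kernel launch, and to poison the result buffer before launching, e.g. `cudaMemset(result, 0xFF, numberOfPoints * sizeof(float2))`, because a float whose four bytes are all 0xFF is a NaN. The host-only C sketch below shows the poisoning idea with `memset`; `poison_results` and `count_unwritten` are hypothetical helpers, and the check assumes the kernel never legitimately produces NaN.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>
#include <string.h>

/* Fill a result buffer with 0xFF bytes. Every float in it becomes a NaN
   (all-ones exponent, nonzero mantissa), so any slot the kernel fails to
   write is detectable after the copy back. */
void poison_results(float *buf, size_t count)
{
    assert(buf != NULL);
    memset(buf, 0xFF, count * sizeof *buf);
}

/* Number of entries that still hold the poison value. */
size_t count_unwritten(const float *buf, size_t count)
{
    size_t unwritten = 0;
    for (size_t i = 0; i < count; ++i)
        if (isnan(buf[i]))
            ++unwritten;
    return unwritten;
}
```

Since `result` holds `float2` values, the buffer contains `2 * numberOfPoints` floats; any nonzero count after the copy back means the kernel did not write every output.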

Linux version: 2.6.18-92.1.17.el5
Device: Quadro NVS 290

SDK and toolkit: newest versions; compile command: nvcc -o cuda
CPU model: Intel® Xeon® CPU X5472 @ 3.00GHz

If anyone has any ideas, please help.