CUDA particle simulation, not all particles interact?

Hi, I’ve created an N-body charged particle simulation that uses CUDA for the calculations and SDL for the rendering, but I can’t seem to get all the particles to interact with each other. Only certain groups of particles seem to actually apply force on each other: I can see two particles basically touching without having any sort of effect, while other particles display the correct behaviour. I know my actual physics algorithm works, because I got it to work on the CPU; it’s just the parallel GPU computation that doesn’t seem to work…

I launch the kernel with N blocks of N threads each, where i is the block number and j is the thread number, then I apply the force of particle array[j] on particle array[i] where i != j.

I won’t supply all my code as it’s all bundled in one file right now with a whole load of irrelevant SDL code, so I will just supply the main loop and the physics() function:

__global__ void physics( particle *arr )
{

	double force;
	double forceT;
	double temp;
	double dx, dy;
	double r;

	unsigned int i = blockIdx.x;
	unsigned int j = threadIdx.x;

	if( i != j )
	{
		temp = ( KE * arr[i].charge * arr[j].charge );
		dx = - arr[i].x + arr[j].x;
		dy = - arr[i].y + arr[j].y;

		r = sqrt( dx * dx + dy * dy );

		force = temp / ( r * r );
		forceT = atan2( dy , dx );

		arr[i].xv += -force * cos( forceT ) / arr[i].mass * DT;
		arr[i].yv += -force * sin( forceT ) / arr[i].mass * DT;

		arr[i].x += arr[i].xv*DT;
		arr[i].y += arr[i].yv*DT;
	}
}

while( running && mainEvent->type != SDL_QUIT )
	{
		//Polls for new events (?)
		SDL_PollEvent( mainEvent );

		//Nuke the window
		SDL_RenderClear( renderer );

		cu_MEMCOPY_D( particle, pArr, N );
		physics<<<N,N>>>( d_pArr );
		cu_MEMCOPY_H( particle, pArr, N );
		renderParticles( h_pArr, N, renderer );

		//Render new frame
		SDL_RenderPresent( renderer );
	}

Be warned: I made a few macros for allocation and copying between the host and the device, so please don’t hate me for using macros.

Is there any light you can possibly shed on my problem?

If you run your code with cuda-memcheck are there any errors reported?

How large is N?

I have no idea what that is. I literally made this simulation immediately after the introductory C++ CUDA slideshow…

I enabled CUDA memory checker in Nsight and clicked CUDA debug, everything works fine.

N is 1024, but the same behaviour is observed for smaller Ns as well.

I kinda fixed it by running N*N blocks with 1 thread each. Though, I wonder if N*N threads in one block would be more efficient…
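On the efficiency question: CUDA limits a block to 1024 threads, so one thread per pair inside a single block would not scale to N = 1024. A different layout that might be worth trying is one thread per particle, with each thread looping over all the others and writing its result to a second buffer. This is only a sketch under assumptions: the particle struct and the KE and DT values are placeholders (the thread does not show them), and physicsGather and stepGather are hypothetical names.

```cuda
#include <cuda_runtime.h>

// Placeholder definitions; the real ones are not shown in the thread.
struct particle { float x, y, xv, yv, mass, charge; };
#define KE 8.99e9f
#define DT 1e-3f

// One thread per particle i. Each thread only reads from `in` and only
// writes out[i], so no two threads ever update the same particle.
__global__ void physicsGather(const particle *in, particle *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    particle p = in[i];
    float ax = 0.0f, ay = 0.0f;

    for (int j = 0; j < n; ++j)
    {
        if (j == i) continue;
        float dx = in[j].x - p.x;
        float dy = in[j].y - p.y;
        float r2 = dx * dx + dy * dy;
        float r  = sqrtf(r2);
        float f  = KE * p.charge * in[j].charge / r2;
        // dx / r and dy / r are the cos and sin of atan2(dy, dx).
        ax -= f * (dx / r) / p.mass;
        ay -= f * (dy / r) / p.mass;
    }

    // Integrate once per step, after all pair contributions are summed.
    p.xv += ax * DT;
    p.yv += ay * DT;
    p.x  += p.xv * DT;
    p.y  += p.yv * DT;
    out[i] = p;
}

// Hypothetical host helper: one step from d_in into d_out.
void stepGather(const particle *d_in, particle *d_out, int n)
{
    const int threads = 256;
    physicsGather<<<(n + threads - 1) / threads, threads>>>(d_in, d_out, n);
}
```

The caller would swap the two device pointers after each step so the next frame reads the freshly written positions.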

With this launch configuration, multiple threads (or blocks) can be updating a given particle’s position or velocity at the same time.

The results of this would be unpredictable.
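To make the hazard concrete, here is a stripped-down sketch that uses the same block i / thread j indexing as the posted kernel but only adds 1 to a per-block slot. The names racyAccumulate and runRacy are hypothetical and exist only for this illustration.

```cuda
#include <cuda_runtime.h>

// Block i, thread j: every thread in a block adds 1 to the same slot with a
// plain +=, mirroring how the posted kernel updates arr[i].xv and arr[i].yv.
__global__ void racyAccumulate(float *slot)
{
    unsigned int i = blockIdx.x;
    unsigned int j = threadIdx.x;

    if (i != j)
        slot[i] += 1.0f;   // unsynchronized read-modify-write
}

// Hypothetical host helper: launches n blocks of n threads (n <= 1024) and
// returns what ended up in slot[0]. A race-free sum would be n - 1.
float runRacy(int n)
{
    float *d_slot = nullptr;
    float result = 0.0f;

    cudaMalloc(&d_slot, n * sizeof(float));
    cudaMemset(d_slot, 0, n * sizeof(float));   // all-zero bytes are 0.0f
    racyAccumulate<<<n, n>>>(d_slot);
    cudaMemcpy(&result, d_slot, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_slot);
    return result;
}
```

A call such as runRacy(256) would ideally return 255. Because the threads read and write slot[i] without any coordination, most of the additions overwrite each other, so the returned value is usually much smaller and may differ between runs.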

A simple approach to sort this out would be to use atomics. There are probably other ways to restructure the code to avoid atomic usage, but a problem with 1024 particles is small enough, and global atomics on Kepler and newer GPUs are fast enough, that this might be sufficient for your needs.
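A minimal sketch of what the atomic version could look like, under several assumptions: the particle struct and the KE and DT values are placeholders (the thread does not show them), and accumulate, integrate, step and the ax/ay buffers are hypothetical names. It uses float because atomicAdd on double requires compute capability 6.0 or newer. It also moves the position update into a second kernel so that each particle is integrated once per step, which is my own restructuring rather than something stated above.

```cuda
#include <cuda_runtime.h>

// Placeholder definitions; the real ones are not shown in the thread.
struct particle { float x, y, xv, yv, mass, charge; };
#define KE 8.99e9f
#define DT 1e-3f

// Pass 1: block i, thread j adds the acceleration that particle j causes on
// particle i. atomicAdd serializes the concurrent updates to ax[i], ay[i].
__global__ void accumulate(const particle *arr, float *ax, float *ay)
{
    unsigned int i = blockIdx.x;
    unsigned int j = threadIdx.x;
    if (i == j) return;

    float dx = arr[j].x - arr[i].x;
    float dy = arr[j].y - arr[i].y;
    float r2 = dx * dx + dy * dy;
    float r  = sqrtf(r2);
    float f  = KE * arr[i].charge * arr[j].charge / r2;

    // dx / r and dy / r are the cos and sin of atan2(dy, dx).
    atomicAdd(&ax[i], -f * (dx / r) / arr[i].mass);
    atomicAdd(&ay[i], -f * (dy / r) / arr[i].mass);
}

// Pass 2: one thread per particle applies the summed acceleration.
__global__ void integrate(particle *arr, const float *ax, const float *ay, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    arr[i].xv += ax[i] * DT;
    arr[i].yv += ay[i] * DT;
    arr[i].x  += arr[i].xv * DT;
    arr[i].y  += arr[i].yv * DT;
}

// Hypothetical host helper: one simulation step for n <= 1024 particles.
void step(particle *d_arr, float *d_ax, float *d_ay, int n)
{
    cudaMemset(d_ax, 0, n * sizeof(float));   // clear the accumulators
    cudaMemset(d_ay, 0, n * sizeof(float));
    accumulate<<<n, n>>>(d_arr, d_ax, d_ay);
    integrate<<<(n + 255) / 256, 256>>>(d_arr, d_ax, d_ay, n);
}
```

In the main loop, a call like step(d_pArr, d_ax, d_ay, N) would take the place of the single physics launch, with d_ax and d_ay allocated once as N floats each.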

Thank you very much for replying, I’ll check it out.