Dereferrencing problem can't dereferrence a pointer correctly

Hi!

I’ve got a structure with pointer of the same type inside. When I try to assign value of some member from inside pointer to member of base structure CUDA cause “Unknown error”. Sorry in advance for terrible explanation so I hope the code will clarify all I tried explain:

struct Particle {

	...

	Neighbour* neighbours;

        ...

}
struct Neighbour {

	Particle*	particle;

};
__device__ void cudaInitDensity(Particle &p){

  Neighbour* neighbours = p.neighbours;

  for(int k = 0; k < p.num_of_neighbours; k++){

	Particle* c = neighbours[k].particle;

        p.density = p.density + c->mass; //error is here

  } 

}

If I do “+ c->mass * 0” it works. If I pass “p” by value it works. I must confess that I’m a beginner at CUDA and maybe don’t completely undestand something.

It almost certainly implies that the pointer is invalid. The compiler is smart enough to know that

p.density + c->mass*0 = p.density + 0

and remove the line as dead code.

As a first guess, your neighbors. array is invalid. c is NULL or an invalid pointer. The “c->mass *0” version might work because the compiler optimizes out the math anyway since it’s smart enough to recognize that a zero multiply returns 0, so it just elides the code.

This is just a guess, there’s no definitive answer from just what you posted.
Try some printf() tests of the value of the c pointer to see if it looks reasonable.

Before I pass data to GPU I check neighbours and it’s correct.

Maximum count of neighbours is 128 so I do

size_t size = 128 * N * sizeof(Particle);

cudaMalloc((void **) &device, size);

For me everything seems to be right.

If that is all you are doing, then that is your problem. When are you allocating memory for or assigning values to the pointers contained inside that allocation?

All the values were assigned to the pointers on CPU before being passed to GPU. I don’t make any special cuda malloc for pointers inside.

Right so there is the problem then. You can’t use host pointers inside device code.

Am I on the right way when I do smth like that to get pointers and data they refer to correct on the GPU?

cudaMalloc((void **) &device, size);

for (int i = 0; i < N; i++) {

  Neighbour * dn;

  size_t size_n = 128*sizeof(Neighbour);

  cudaMalloc((void **) &dn, size_n);

  cudaMemcpy(dn, ps->particles[i].neighbours, size_n, cudaMemcpyHostToDevice);

  device[i].neighbours = dn;

}

That code doesn’t do what you think it does. It is setting some random values in host memory, not device memory.

What is a workaround in this case except copying all the neighbours and passing them separately?

Here’s what you need to do:

Particle* dParticles;

Neighbour* dNeighbours;

Particle* pParticles_temp;

Neighbour* pNeighbours_temp;

cudaMalloc((void **) &dParticles, N*sizeof(Particle));

cudaMalloc((void **) &dNeighbours, N*128*sizeof(Neighbour));

pParticles_temp = new Particle[N];

pNeighbours_temp = new Neighbour[N*128];

memcpy(pParticles_temp, pParticles, N*sizeof(Particle));

for (int i = 0; i < N; i++)

{

  pParticles_temp[i].neighbours = dNeighbours + i*128;

  for(int j = 0; j < pParticles_temp[i].num_of_neighbours; j++)

  {

    pNeighbours_temp[i*128+j].particle = dParticles + (pParticles[i].neighbours.particle[j] - pParticles);

  }

}

cudaMemcpy(dParticles, pParticles_temp, N*sizeof(Particle), cudaMemcpyHostToDevice);

cudaMemcpy(dNeighbours, pNeighbours_temp, N*128*sizeof(Neighbour), cudaMemcpyHostToDevice);

delete[] pParticles_temp;

delete[] pNeighbours_temp;

But, if you’re not feeling any memory pressures and you’re OK with allocating a 128-element array for each element, your code will be much simplified if you redefine Particle like this

struct Particle {

        ...

        int neighbours[128];

        ...

}

...

__device__ void cudaInitDensity(Particle* pArray, Particle &p){

  for(int k = 0; k < p.num_of_neighbours; k++){

        Particle* c = pArray+p.neighbours[k];

        p.density = p.density + c->mass; 

  } 

}

and this way you just do one cudaMemcpy and you don’t have to worry about all the pointers.

Here is the code I launch:

int N = ps->getPartNum();

		FILE *ofp;

		ofp = fopen("c:\exec.txt", "w");

		fprintf(ofp, "neighb 1966 10 = %f \n", ps->particles[1966].neighbours[10].particle->density);

		Particle* dParticles;

		Neighbour* dNeighbours;

		Particle* pParticles_temp;

		Neighbour* pNeighbours_temp;

		cudaMalloc((void **) &dParticles, N*sizeof(Particle));

		cudaMalloc((void **) &dNeighbours, N*128*sizeof(Neighbour));

		pParticles_temp = new Particle[N];

		pNeighbours_temp = new Neighbour[N*128];

		memcpy(pParticles_temp, ps->particles, N*sizeof(Particle));

		for (int i = 0; i < N; i++)

		{

		  pParticles_temp[i].neighbours = dNeighbours + i*128;

		  for(int j = 0; j < pParticles_temp[i].num_of_neighbours; j++)

		  {

			pNeighbours_temp[i*128+j].particle = 

				dParticles +

					(ps->particles[i].neighbours[j].particle - ps->particles);

		  }

		}

		cudaMemcpy(dParticles, pParticles_temp, N*sizeof(Particle), cudaMemcpyHostToDevice);

		cudaMemcpy(dNeighbours, pNeighbours_temp, N*128*sizeof(Neighbour), cudaMemcpyHostToDevice);

		delete[] pParticles_temp;

		delete[] pNeighbours_temp;

		cudaPrintfInit();

		mainCalcCuda <<< N, 1 >>> (dParticles, framenum);

		cudaThreadSynchronize();

		cudaMemcpy(ps->particles, dParticles, N*sizeof(Particle), cudaMemcpyDeviceToHost);

		//neighbours from device to host

		for (int i = 0; i < N; i++)

		{

		  ps->particles[i].neighbours = new Neighbour[128];

		  cudaMemcpy(ps->particles[i].neighbours, dNeighbours + i*128, 128*sizeof(Neighbour), cudaMemcpyDeviceToHost);		  

		}	

		printf("%s \n",cudaGetErrorString(cudaGetLastError()));

		//fprintf(ofp, "after neighb 1966 10 = %f \n", ps->particles[1966].neighbours[10].particle->density);

		fclose(ofp); 

		cudaPrintfDisplay(stdout, true);

		cudaPrintfEnd();

Before passing data to gpu I check

ps->particles[1966].neighbours[10].particle->density

and everything seems ok (zero).

but after receiving the data from gpu smth wrong happens:

ps->particles[1966].neighbours[10].particle->density

becomes 1.#QNAN0 and error occures.

Here is the code where I find 1.#qnan first time (inside mainCalcCuda-kernel)

Neighbour* neighbours = p.neighbours;

  float summ = 0;

  for(int k = 0; k < p.num_of_neighbours; k++){

	Particle* c = neighbours[k].particle;

	cuPrintf("c.density %d %d = %f;\n", idx, k, c->density); //there it happens

	float w = cudaW(cudaNorm(c->prevPosition - p.prevPosition), c);	

	summ += c->mass * w;	

  }

All others neighbours are ok. 1970 particles are in the system.

Am I doing passing/receiving data in the right way? How do you think what’s wrong?