How to copy a linked list

Thorn_Striff · March 28, 2010, 6:56am

Hello,

I’m trying to copy some structs that are chained together by pointers from global memory to host memory (in Portuguese we call them “listas ligadas”; linked lists in English, I believe).

This is my code:

cudaMemcpy(X,dx,sizeof(resultados),cudaMemcpyDeviceToHost);
n = dx->prox;
m = X;
while(n != NULL){
  cudaMemcpy(m->prox,n,sizeof(resultados),cudaMemcpyDeviceToHost);
  n = n->prox;
  m = m->prox;
}

where dx, X, n and m are pointers to structs like this:

typedef struct resultados{
  int tiro;
  int raio;
  float x;
  float z;
  float t;
  struct resultados *prox;
}resultados;

I must be doing something wrong, because when I try this copy I get a “segmentation fault” error. Can someone help?

Isn’t there some way to copy the whole struct array at once?

_Big_Mac · March 28, 2010, 9:35am

You wouldn’t ever want to use linked lists on the device, and calling cudaMemcpy thousands of times for tiny objects is the least efficient way to copy data.

You should flatten the list into an array and use arrays for all intensive computation.

The reason your code doesn’t work is probably that the ‘prox’ pointers point into the wrong memory space (host vs. device).
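As a rough illustration of the flattening advice: in the original snippet `dx` is the source of a device-to-host copy, so it is a device pointer, and `dx->prox` reads device memory from host code, which is a likely cause of the segmentation fault. The sketch below assumes the records sit in one contiguous device array instead of separately allocated nodes. The helper names `religar` and `copiar_resultados` and the `count` parameter are hypothetical and do not come from the thread.

```cuda
#include <cuda_runtime.h>
#include <stddef.h>

typedef struct resultados {
  int tiro;
  int raio;
  float x;
  float z;
  float t;
  struct resultados *prox;
} resultados;

// Rebuild the host-side chain: element k points at element k+1.
// Any prox values copied from the device are device addresses and
// must not be dereferenced on the host, so they are overwritten here.
static void religar(resultados *h, size_t count) {
  for (size_t k = 0; k < count; ++k)
    h[k].prox = (k + 1 < count) ? &h[k + 1] : NULL;
}

// Copy `count` records from a contiguous device array with a single
// cudaMemcpy, then relink them on the host.
static cudaError_t copiar_resultados(resultados *h, const resultados *d,
                                     size_t count) {
  cudaError_t err = cudaMemcpy(h, d, count * sizeof(resultados),
                               cudaMemcpyDeviceToHost);
  if (err == cudaSuccess) religar(h, count);
  return err;
}
```

With this layout the host buffer can be walked either by index or through `prox`, and the transfer is one call regardless of how many records there are.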

Thorn_Striff · March 28, 2010, 7:04pm

Thanks for your help!

I changed my code and now I’m using an array to save all the data calculated in the kernel. But now I have another problem.

I have a whole thread dedicated to saving data to this array, as you can see:

//on host:
#define TAM_MAX 10000
...
cudaMalloc((void**)&d_X,TAM_MAX*5*sizeof(float));
...

//on kernel:
...
__device__ __shared__ float x,z,t,zo;
...
if(threadIdx.x == 4){
  t = fx(x,z);
  if(z - z0 >= dt){
    d_X[i*5+0] = blockIdx.x;
    d_X[i*5+1] = blockIdx.y;
    d_X[i*5+2] = x;
    d_X[i*5+3] = z;
    d_X[i*5+4] = t;
    i++;
    z0 = z;
  }
}

where x, z and t live in shared memory and the d_X array lives in global memory.

But when I try to copy the data back to host memory, it looks like nothing was changed in global memory.

cudaMemcpy(X,d_X,TAM_MAX*5*sizeof(float),cudaMemcpyDeviceToHost);

Every position of the array comes back as “0”.

Any idea what’s going on?
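One way to narrow down a symptom like this is a small round-trip test, sketched below under stated assumptions. It is not the poster's kernel: `k_grava`, `testar_escrita` and `CAMPOS` are hypothetical names, the values written for x, z and t are placeholders, and the record index is simply `blockIdx.x`. The device buffer is pre-filled with 0xFF bytes, which is a NaN bit pattern for `float`, so slots the kernel never wrote can be told apart from slots that really hold zero.

```cuda
#include <cuda_runtime.h>
#include <stddef.h>

#define CAMPOS 5  // fields per record: tiro, raio, x, z, t

// Each block writes one record at slot blockIdx.x; only thread 0 writes,
// mirroring the single-writer-thread idea from the post.
__global__ void k_grava(float *d_X, int n) {
  int i = blockIdx.x;
  if (threadIdx.x == 0 && i < n) {
    d_X[i * CAMPOS + 0] = (float)blockIdx.x;
    d_X[i * CAMPOS + 1] = (float)blockIdx.y;
    d_X[i * CAMPOS + 2] = 1.0f;  // placeholder for x
    d_X[i * CAMPOS + 3] = 2.0f;  // placeholder for z
    d_X[i * CAMPOS + 4] = 3.0f;  // placeholder for t
  }
}

// Fill the device buffer with 0xFF bytes, run the kernel, and copy
// everything back. Unwritten slots stay NaN on the host.
static cudaError_t testar_escrita(float *h_X, int n) {
  float *d_X = NULL;
  size_t bytes = (size_t)n * CAMPOS * sizeof(float);
  cudaError_t err = cudaMalloc((void **)&d_X, bytes);
  if (err != cudaSuccess) return err;
  err = cudaMemset(d_X, 0xFF, bytes);
  if (err == cudaSuccess) {
    k_grava<<<n, 32>>>(d_X, n);
    err = cudaGetLastError();  // reports launch failures
  }
  if (err == cudaSuccess)
    err = cudaMemcpy(h_X, d_X, bytes, cudaMemcpyDeviceToHost);
  cudaFree(d_X);
  return err;
}
```

If a test like this returns the expected values while the real kernel still yields zeros, the problem is more likely in the kernel's own control flow or indexing (for example, the write branch never being taken) than in the copy.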

seibert · March 28, 2010, 9:38pm

Are you sure the kernel is running successfully? You can check the error code on the kernel by looking at the return code from cudaThreadSynchronize() after you launch the kernel. (cudaThreadSynchronize() is not required for correct behavior, but it’s a handy way to wait for the kernel to finish and check for errors while you are debugging.)

Thorn_Striff · March 29, 2010, 1:12am

How do I check the return value of cudaThreadSynchronize()? I tried:

if (cudaThreadSynchronize() == cudaSuccess) printf("done\n");

And it looks like “cudaSuccess” isn’t what cudaThreadSynchronize() returns.

seibert · March 29, 2010, 2:03pm

Call cudaGetErrorString() on the return code to get a string you can print.

Thorn_Striff · March 30, 2010, 4:20am

Done.

k_iteracao<<<Pos0,dimBlock>>>(d_inx,d_inteta,p,d_p1,d_p2,Pos0,Raios,d_X);

  printf("%s\n",cudaGetErrorString(cudaThreadSynchronize()));

Got a “no error” return.

Thorn_Striff · March 30, 2010, 5:12am

I just compiled with cudaGetErrorString on the cudaMemcpy call, and it returns “no error” too.
It looks like cudaMemcpy isn’t copying the data back to the host, but why?

avidday · March 30, 2010, 5:12am

That still isn’t telling you that the kernel ever ran successfully. Add a call to cudaGetLastError() directly after the kernel launch, something like this:

k_iteracao<<<Pos0,dimBlock>>>(d_inx,d_inteta,p,d_p1,d_p2,Pos0,Raios,d_X);

printf("%s\n",cudaGetErrorString(cudaGetLastError()));

printf("%s\n",cudaGetErrorString(cudaThreadSynchronize()));

That should detect both whether the launch was successful and whether the kernel ran to completion without generating a runtime error.
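The two-step check described above can be wrapped in a small helper, sketched here. The names `checar`, `k_exemplo` and `lancar_e_checar` are hypothetical, and `k_exemplo` is only a stand-in for `k_iteracao`. The sketch calls cudaDeviceSynchronize(), which replaced cudaThreadSynchronize() in later toolkits; on a toolkit from the era of this thread the older call would be used instead.

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

// Hypothetical helper: print a labelled message when a CUDA call fails
// and hand the error code back to the caller.
static cudaError_t checar(cudaError_t err, const char *onde) {
  if (err != cudaSuccess)
    fprintf(stderr, "%s: %s\n", onde, cudaGetErrorString(err));
  return err;
}

// Placeholder kernel standing in for k_iteracao.
__global__ void k_exemplo(float *d_X) {
  d_X[blockIdx.x] = (float)blockIdx.x;
}

// Launch, then check the launch itself and the execution separately.
static cudaError_t lancar_e_checar(float *d_X, int blocos) {
  k_exemplo<<<blocos, 1>>>(d_X);
  // Launch problems (bad configuration, invalid arguments) show up here.
  cudaError_t err = checar(cudaGetLastError(), "launch");
  if (err != cudaSuccess) return err;
  // Errors raised while the kernel runs show up after synchronizing.
  return checar(cudaDeviceSynchronize(), "execution");
}
```

Labelling each message makes it clear which of the two stages failed, which the bare printf calls do not show.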

Thorn_Striff · March 30, 2010, 3:53pm

Thanks for the help, avidday.
It still returns “no error” for both.