How to copy a linked list

Hello,

I'm trying to copy some structs, chained one to the next by a pointer, from global (device) memory to host memory (in Portuguese we call these “listas ligadas”; linked lists in English, I believe).

This is my code:

cudaMemcpy(X,dx,sizeof(resultados),cudaMemcpyDeviceToHost);
n = dx->prox;
m = X;
while(n != NULL){
    cudaMemcpy(m->prox,n,sizeof(resultados),cudaMemcpyDeviceToHost);
    n = n->prox;
    m = m->prox;
}

Here dx, X, n and m are pointers to structs like this:

typedef struct resultados{
    int tiro;
    int raio;
    float x;
    float z;
    float t;
    struct resultados *prox;
}resultados;

I must be doing something wrong, because when I try this copy I get a “segmentation fault” error. Can someone help?

Isn't there some way to copy the whole struct array at once?

You would almost never want to use linked lists on the device, and calling cudaMemcpy thousands of times for tiny objects is the least efficient way to copy data.

You should flatten the list to an array and use arrays for all intensive computation.

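The reason your code doesn’t work is probably that the ‘prox’ pointers point into the wrong memory space (host vs. device). For example, dx->prox and n->prox dereference a device pointer on the host, and after the first copy X->prox holds a device address, so using m->prox as the host destination of cudaMemcpy cannot work.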
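The flattening suggested above can be sketched in plain C roughly as follows. This is only an illustration under assumptions: the list is assumed to live in host memory, and flatten_list is a hypothetical helper name that does not appear in the original code. The position in the array replaces the link, so every prox field in the output is set to NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Same node layout as in the question. */
typedef struct resultados {
    int tiro;
    int raio;
    float x;
    float z;
    float t;
    struct resultados *prox;
} resultados;

/* Hypothetical helper: copy at most `capacity` nodes of a host-side list
   into the contiguous array `out`, in list order, and return how many
   nodes were copied. The prox field of each copy is cleared because a
   host address is meaningless on the device (and vice versa). */
size_t flatten_list(const resultados *head, resultados *out, size_t capacity)
{
    size_t n = 0;
    assert(out != NULL || capacity == 0);
    for (const resultados *p = head; p != NULL && n < capacity; p = p->prox) {
        out[n] = *p;         /* copy the payload fields */
        out[n].prox = NULL;  /* array index now plays the role of the link */
        n++;
    }
    return n;
}
```

Once the data is contiguous, a single cudaMemcpy of n * sizeof(resultados) bytes moves the whole array in either direction, instead of one call per node.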

Thanks for your help!

I changed my code and now I'm using an array to save all the data calculated in the kernel. But now I have another problem.

I dedicate an entire thread just to saving data to this array, as you can see:

//on host:
#define TAM_MAX 10000
...
cudaMalloc((void**)&d_X,TAM_MAX*5*sizeof(float));
...

//on kernel:
...
__device__ __shared__ float x,z,t,zo;
...
if(threadIdx.x == 4){
    t = fx(x,z);
    if(z - z0 >= dt){
        d_X[i*5+0] = blockIdx.x;
        d_X[i*5+1] = blockIdx.y;
        d_X[i*5+2] = x;
        d_X[i*5+3] = z;
        d_X[i*5+4] = t;
        i++;
        z0 = z;
    }
}

Here x, z and t live in shared memory and the d_X array lives in global memory.

But when I copy the data back to host memory, it looks like nothing was changed in global memory.

cudaMemcpy(X,d_X,TAM_MAX*5*sizeof(float),cudaMemcpyDeviceToHost);

Every position of the array always comes back as “0”.

Any idea what's going on?
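When inspecting the copied-back buffer on the host, it can help to decode it record by record rather than as raw floats. The sketch below assumes the layout implied by the kernel snippet (five consecutive floats per record, with record i starting at X[i*5]). The record type and the helpers read_record and count_nonzero_records are hypothetical names introduced only for illustration, and mapping the first two slots to tiro and raio is a guess based on the struct in the first post.

```c
#include <assert.h>
#include <stddef.h>

#define FIELDS 5  /* floats per record, matching the i*5+k indexing */

/* Hypothetical host-side view of one record. */
typedef struct {
    int tiro;   /* assumed: slot 0, written from blockIdx.x */
    int raio;   /* assumed: slot 1, written from blockIdx.y */
    float x;    /* slot 2 */
    float z;    /* slot 3 */
    float t;    /* slot 4 */
} record;

/* Decode record i from the flat buffer X. */
record read_record(const float *X, size_t i)
{
    record r;
    assert(X != NULL);
    r.tiro = (int)X[i * FIELDS + 0];
    r.raio = (int)X[i * FIELDS + 1];
    r.x = X[i * FIELDS + 2];
    r.z = X[i * FIELDS + 3];
    r.t = X[i * FIELDS + 4];
    return r;
}

/* Count records in which at least one of the five slots is nonzero. */
size_t count_nonzero_records(const float *X, size_t n_records)
{
    size_t count = 0;
    for (size_t i = 0; i < n_records; i++) {
        for (size_t k = 0; k < FIELDS; k++) {
            if (X[i * FIELDS + k] != 0.0f) {
                count++;
                break;  /* one nonzero slot is enough for this record */
            }
        }
    }
    return count;
}
```

Calling count_nonzero_records(X, TAM_MAX) after the cudaMemcpy gives a quick indication of whether the kernel wrote anything at all, assuming the buffer was zeroed before the launch.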

Are you sure the kernel is running successfully? You can check the error code on the kernel by looking at the return code from cudaThreadSynchronize() after you launch the kernel. (cudaThreadSynchronize() is not required for correct behavior, but it’s a handy way to wait for the kernel to finish and check for errors while you are debugging.)

How do I check the return value of cudaThreadSynchronize()? I tried:

if (cudaThreadSynchronize() == cudaSuccess) printf("done\n");

And it looks like “cudaSuccess” isn't what cudaThreadSynchronize() returns.

Call cudaGetErrorString() on the return code to get a string you can print.

Done.

k_iteracao<<<Pos0,dimBlock>>>(d_inx,d_inteta,p,d_p1,d_p2,Pos0,Raios,d_X);

  printf("%s\n",cudaGetErrorString(cudaThreadSynchronize()));

Got a “no error” return.

I just compiled with cudaGetErrorString on the cudaMemcpy call and it returns “no error” too.
It looks like cudaMemcpy isn't copying the data back to the host, but why?

That still isn’t telling you that the kernel ever ran successfully. Add a call to cudaGetLastError() directly after the kernel launch, something like this:

k_iteracao<<<Pos0,dimBlock>>>(d_inx,d_inteta,p,d_p1,d_p2,Pos0,Raios,d_X);

printf("%s\n",cudaGetErrorString(cudaGetLastError()));

printf("%s\n",cudaGetErrorString(cudaThreadSynchronize()));

That should detect both whether the launch was successful and whether the kernel ran to completion without generating a runtime error.

Thanks for the help, avidday.
It still returns “no error” for both.