I’m trying to copy some structs from global memory to host memory. The structs are connected one to the next by a pointer (in Portuguese we call these “listas ligadas”; linked lists in English, I believe).
This is my code:
cudaMemcpy(X,dx,sizeof(resultados),cudaMemcpyDeviceToHost);
n = dx->prox;
m = X;
while(n != NULL){
cudaMemcpy(m->prox,n,sizeof(resultados),cudaMemcpyDeviceToHost);
n = n->prox;
m = m->prox;
}
Here dx, X, n and m are pointers to structs like this:
typedef struct resultados{
int tiro;
int raio;
float x;
float z;
float t;
struct resultados *prox;
}resultados;
I must be doing something wrong, because when I try to do this copy I get a “segmentation fault” error. Can someone help?
You wouldn’t ever want to use linked lists on the device, and calling cudaMemcpy thousands of times for tiny objects is the least efficient way to copy data.
You should flatten the list to an array and use arrays for all intensive computation.
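A minimal sketch of what the flattened layout could look like, assuming the results can be written into a preallocated array. The names resultado_plano, preenche and copia_plana are hypothetical, and the kernel only writes placeholder values. The point is that the array index replaces the prox link, so the whole result set comes back in a single cudaMemcpy:

```cuda
#include <stddef.h>
#include <cuda_runtime.h>

// Same fields as `resultados`, minus the `prox` pointer: the array index
// replaces the link. (`resultado_plano` is a hypothetical name.)
typedef struct {
    int tiro;
    int raio;
    float x;
    float z;
    float t;
} resultado_plano;

// Hypothetical kernel: each thread writes one record into its own slot.
__global__ void preenche(resultado_plano *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i].tiro = i;
        out[i].raio = 2 * i;
        out[i].x = 0.5f * i;
        out[i].z = 0.0f;
        out[i].t = 1.0f;
    }
}

// Fills `host` (n elements) and returns the first CUDA error encountered.
cudaError_t copia_plana(resultado_plano *host, int n)
{
    resultado_plano *dev = NULL;
    cudaError_t err = cudaMalloc((void **)&dev, n * sizeof(resultado_plano));
    if (err != cudaSuccess) return err;

    preenche<<<(n + 127) / 128, 128>>>(dev, n);
    err = cudaGetLastError();

    // One transfer for the whole result set; the copy waits for the kernel.
    if (err == cudaSuccess)
        err = cudaMemcpy(host, dev, n * sizeof(resultado_plano),
                         cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return err;
}
```

If the number of results is not known in advance, one common option is to allocate for the worst case and have the kernel record how many slots it actually used.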
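The reason your code doesn’t work is probably that the ‘prox’ pointers point into the wrong memory space (host vs. device). dx is a device pointer, so evaluating dx->prox or n->prox on the host dereferences a device address, and after the first copy m->prox holds a device address that is then passed to cudaMemcpy as a host destination.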
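If the list really has to be copied node by node, the walk can be sketched roughly like this. It assumes dx is the device address of the first node and that every prox stored on the device is either NULL or another device address. The function name copia_lista is hypothetical. Each node is copied into a freshly allocated host node, and the device address of the next node is read from that host copy rather than through the device pointer:

```cuda
#include <stdlib.h>
#include <cuda_runtime.h>

typedef struct resultados {
    int tiro;
    int raio;
    float x;
    float z;
    float t;
    struct resultados *prox;
} resultados;

// Copies a device-resident list starting at device address `dx` into newly
// malloc'ed host nodes. Returns the host head (NULL for an empty list). If a
// malloc or copy fails, the nodes copied so far are returned.
resultados *copia_lista(const resultados *dx)
{
    resultados *head = NULL;
    resultados **link = &head;  // where the next host node's address goes
    const resultados *d = dx;   // device address; never dereferenced on host

    while (d != NULL) {
        resultados *h = (resultados *)malloc(sizeof(resultados));
        if (h == NULL) break;
        if (cudaMemcpy(h, d, sizeof(resultados),
                       cudaMemcpyDeviceToHost) != cudaSuccess) {
            free(h);
            break;
        }
        d = h->prox;     // device address of the next node, read from the copy
        h->prox = NULL;  // replaced by a host address on the next iteration
        *link = h;
        link = &h->prox;
    }
    return head;
}
```

This still issues one cudaMemcpy per node, so it only addresses the crash, not the performance problem.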
Are you sure the kernel is running successfully? You can check for kernel errors by looking at the return code from cudaThreadSynchronize() after you launch the kernel (newer CUDA releases deprecate it in favor of cudaDeviceSynchronize()). The synchronize call is not required for correct behavior, but it’s a handy way to wait for the kernel to finish and check for errors while you are debugging.
That still doesn’t tell you that the kernel ever ran successfully. Add a call to cudaGetLastError() directly after the kernel launch. Something like this:
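A minimal sketch of that check, with a hypothetical kernel meu_kernel and a hypothetical wrapper lanca_e_verifica standing in for the real code. It uses cudaDeviceSynchronize(), the current name for cudaThreadSynchronize():

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Hypothetical kernel standing in for the real one.
__global__ void meu_kernel(int *out)
{
    out[threadIdx.x] = threadIdx.x;
}

// Launches the kernel and reports both kinds of failure.
cudaError_t lanca_e_verifica(int *d_out, int threads)
{
    meu_kernel<<<1, threads>>>(d_out);

    // Launch errors (for example an invalid configuration) show up here.
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        fprintf(stderr, "launch failed: %s\n", cudaGetErrorString(err));
        return err;
    }

    // Errors raised while the kernel executes show up once it has finished.
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess)
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
    return err;
}
```

Checking both places separates a kernel that never started from one that started and then faulted.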