Miserable struct array and cudaMemcpy

I have been fighting with structs in CUDA for a couple of days. Passing a simple float array works fine. The problem is with simple structs…

Take a look at the snippets.

struct edge looks like this for now (at the beginning it was bigger, but the problems made me cut it down to a stand-in):

struct edge {
        int value;
};

#include <stdio.h>
#include "structs.h"
#include <cuda.h>

__global__ void findAllEdges(struct edge** ed) {
        int idx = blockIdx.x * blockDim.x + threadIdx.x;
        (*ed)[idx].value = 23;  //some value, not important yet
}


void findEdges(struct elem** elems, struct edge** edges, struct param* params) {
        int i;
        int edgesAmount = 4*( params->amountOfElems );  //40 atm

        (*edges) = (struct edge*) malloc (sizeof(struct edge)*edgesAmount);
        for(i=0;i<edgesAmount;i++) {
                (*edges)[i].value = 11; //value to differ from wanted 23
        }

        struct edge* d_edges;
        size_t size = edgesAmount*sizeof(struct edge);
        cudaMalloc((void **) &d_edges, size);
        cudaMemcpy(d_edges, edges, size, cudaMemcpyHostToDevice);

        int NT = params->amountOfElems*4;
        int n_blocks = NT/5 + (NT%5 == 0 ? 0:1);
        findAllEdges <<< n_blocks, 5  >>> (&d_edges);

        int cudaStatus = cudaMemcpy(edges, d_edges, size, cudaMemcpyDeviceToHost);
        if(cudaStatus == cudaErrorLaunchFailure)
                printf("WTF\n");

        for(i=0;i<edgesAmount;i++) {
                printf("edge[%d] val %d \n",i,(*edges)[i].value);
        }
}



The last cudaMemcpy returns cudaErrorLaunchFailure, so it prints WTF…

When I print the array of edges, it shows the value 11 instead of 23 (due to the failed cudaMemcpy, I guess).

What am I doing wrong? I am about to go insane over this simple issue…

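You are mixing single and double pointers, and host and device pointers.

Here

cudaMemcpy(d_edges, edges, size, cudaMemcpyHostToDevice);

edges is a struct edge**, so the copy reads the host pointer itself (and whatever follows it in memory) rather than the array it points to. You are trying to copy pointers into an array of struct edge, which can’t work; the source should be *edges. The same applies to the copy back to the host.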

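And here

findAllEdges <<< n_blocks, 5  >>> (&d_edges);

you are passing a host pointer to a device pointer where the kernel expects a device pointer to a device pointer, which can’t work either: &d_edges is the address of a host variable, and the kernel cannot dereference it. That invalid access is what the following cudaMemcpy reports as cudaErrorLaunchFailure. Declare the kernel parameter as struct edge* and pass d_edges itself.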

P.S.: What are you trying to achieve by the use of double pointers? As you see above, these quickly become difficult with CUDA, and are best avoided.
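The two fixes described above can be sketched as follows. This is a minimal sketch, not the poster's final code: the helper name runEdges, the extra length parameter n on the kernel, and the error handling are assumptions added for illustration. Here edges is already a plain struct edge*; inside the original findEdges, where edges is a struct edge**, the equivalent would be to pass *edges to cudaMemcpy.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

struct edge {
    int value;
};

// The kernel takes a plain device pointer to the array. The length n is an
// addition for this sketch, used only to guard against out-of-range threads.
__global__ void findAllEdges(struct edge *ed, int n) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) {
        ed[idx].value = 23;
    }
}

// Hypothetical helper: runs the kernel over a host array of n edges.
// Returns 0 on success, 1 if any CUDA call failed.
int runEdges(struct edge *edges, int n) {
    struct edge *d_edges = NULL;
    size_t size = n * sizeof(struct edge);

    if (cudaMalloc((void **)&d_edges, size) != cudaSuccess) return 1;

    // The source is the host array itself, not the address of a pointer to it.
    cudaError_t err = cudaMemcpy(d_edges, edges, size, cudaMemcpyHostToDevice);

    if (err == cudaSuccess) {
        int threads = 5;                            // block size from the thread
        int blocks = (n + threads - 1) / threads;   // round up
        findAllEdges<<<blocks, threads>>>(d_edges, n);  // d_edges, not &d_edges
        err = cudaGetLastError();                   // catches launch errors
    }
    if (err == cudaSuccess) {
        // Errors raised while the kernel runs are reported by this call.
        err = cudaMemcpy(edges, d_edges, size, cudaMemcpyDeviceToHost);
    }
    cudaFree(d_edges);

    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}

int main(void) {
    int n = 40;
    struct edge *edges = (struct edge *)malloc(n * sizeof(struct edge));
    if (edges == NULL) return 1;
    for (int i = 0; i < n; i++) edges[i].value = 11;  // differs from wanted 23

    int failed = runEdges(edges, n);
    if (!failed) {
        for (int i = 0; i < n; i++) printf("edge[%d] val %d\n", i, edges[i].value);
    }
    free(edges);
    return failed;
}
```

Checking the return value of every CUDA call, rather than comparing only against cudaErrorLaunchFailure, makes it easier to see which step actually failed.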

Thanks, redoing it all with single pointers did the job.

I use double pointers because it’s the only way I know to hand allocated pointers back to the caller and keep using them without a return statement.
I did something similar in MPI and PVM and that worked fine :)
For now I changed that function to just return a pointer to the allocated and “filled” struct edge array… But what do you suggest when I want to operate on, e.g., 3 struct arrays at the same time, change something in them, and then use them in other functions? That’s a C question, I know.
When I do it with single pointers (no return statement) it gives me ‘memory violation blablabla’.

OK, you can use pointers to pointers on the host side (as cudaMalloc() does, for exactly the reason you mentioned). But by the time you copy data to the device, these are better flattened to single pointers.
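As a rough illustration of that split, the sketch below uses pointer-to-pointer out-parameters only on the host, to hand several allocations back to the caller, and passes flat single pointers to the kernel. The names allocArrays and fill, and the layout of struct elem, are hypothetical; the thread does not show them. With a single-pointer parameter instead, the assignment inside the function would change only the function's local copy, which is a likely cause of the ‘memory violation’ mentioned above.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

struct elem { int id; };      // hypothetical layout, not shown in the thread
struct edge { int value; };

// Host-side out-parameters: the caller passes the addresses of its own
// pointers, the same pattern cudaMalloc() itself uses.
// Allocates nElems elems and 4 * nElems edges on the device.
cudaError_t allocArrays(struct elem **d_elems, struct edge **d_edges, int nElems) {
    *d_elems = NULL;
    *d_edges = NULL;
    cudaError_t err = cudaMalloc((void **)d_elems, nElems * sizeof(struct elem));
    if (err != cudaSuccess) return err;
    err = cudaMalloc((void **)d_edges, 4 * nElems * sizeof(struct edge));
    if (err != cudaSuccess) {
        cudaFree(*d_elems);   // undo the first allocation on failure
        *d_elems = NULL;
    }
    return err;
}

// The kernel only ever sees flat single pointers into device memory.
__global__ void fill(struct elem *elems, struct edge *edges, int nElems) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < nElems) {
        elems[idx].id = idx;
        for (int k = 0; k < 4; k++) {
            edges[4 * idx + k].value = idx;
        }
    }
}

int main(void) {
    const int nElems = 10;
    struct elem *d_elems;
    struct edge *d_edges;

    if (allocArrays(&d_elems, &d_edges, nElems) != cudaSuccess) return 1;

    fill<<<(nElems + 63) / 64, 64>>>(d_elems, d_edges, nElems);

    // Copy one edge back to confirm the kernel wrote through both arrays.
    struct edge last;
    cudaError_t err = cudaMemcpy(&last, d_edges + 4 * nElems - 1, sizeof(last),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_elems);
    cudaFree(d_edges);
    if (err != cudaSuccess) return 1;

    printf("last edge value: %d\n", last.value);
    return 0;
}
```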

Otherwise you need to think carefully about each level of indirection: in which kind of memory the pointer lives, and into which address space it points. However, there is usually little gain in setting up complex data structures on the device, as the GPU prefers to operate on flat arrays for optimal speed.
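One way to keep that bookkeeping manageable, offered here only as a sketch, is a single level of indirection: a small struct of device pointers passed to the kernel by value. The struct itself lives in host memory (and is copied into the kernel's arguments at launch), while its pointer members point into device memory. The names struct mesh, touch, and touchAll, and the degree array, are hypothetical and not from the thread.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

struct edge { int value; };

// Hypothetical bundle of flat arrays. The struct is an ordinary host
// variable; the two pointers inside it hold device addresses.
struct mesh {
    struct edge *edges;   // device memory, n entries
    int *degree;          // device memory, n entries
    int n;
};

// The struct is passed by value, so the kernel receives its own copy of
// the pointers and never dereferences a host address.
__global__ void touch(struct mesh m) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < m.n) {
        m.edges[idx].value = 23;
        m.degree[idx] = idx;
    }
}

// Runs the kernel and copies both arrays back. Returns 0 on success.
int touchAll(struct edge *hostEdges, int *hostDegree, int n) {
    struct mesh m;
    m.edges = NULL;
    m.degree = NULL;
    m.n = n;

    cudaError_t err = cudaMalloc((void **)&m.edges, n * sizeof(struct edge));
    if (err == cudaSuccess) {
        err = cudaMalloc((void **)&m.degree, n * sizeof(int));
    }
    if (err == cudaSuccess) {
        touch<<<(n + 63) / 64, 64>>>(m);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess) {
        err = cudaMemcpy(hostEdges, m.edges, n * sizeof(struct edge),
                         cudaMemcpyDeviceToHost);
    }
    if (err == cudaSuccess) {
        err = cudaMemcpy(hostDegree, m.degree, n * sizeof(int),
                         cudaMemcpyDeviceToHost);
    }
    cudaFree(m.edges);    // cudaFree(NULL) is a no-op
    cudaFree(m.degree);
    return err == cudaSuccess ? 0 : 1;
}

int main(void) {
    struct edge edges[40];
    int degree[40];
    if (touchAll(edges, degree, 40) != 0) return 1;
    printf("edges[0].value = %d, degree[39] = %d\n", edges[0].value, degree[39]);
    return 0;
}
```

Anything deeper than this, such as an array of pointers that itself lives in device memory, needs a separate allocation and copy for each level, which is where the flat layout starts to pay off.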