Passing two nested structures, with a pointer-to-pointer in the child structure, to the device?

I have two structures that need to be transferred from host memory to device memory:

struct collapsed {
	char **seq;
	int num;
};

// set of collapsed examples
struct data {
	collapsed *x;
	int num;
	int numblocks;
	int *blocksizes;
	float *regmult;
	float *learnmult;
};

I have one statically defined ‘data X;’ to pass, and X contains a pointer to an array of collapsed. I have been working on this for a week and I am at my wits' end: I have searched the web and the forums and still cannot get it to work. Could a senior developer help me with a small code snippet or an explanation?

I'm not a senior developer, but I see two ways of doing this (if I understand correctly):

  1. Embed the whole “collapsed” structure inside the “data” structure and transfer that single structure to the device (change the “collapsed *x;” member to “collapsed x;” or “collapsed x[100];”). This might not be what you want, though.

  2. If you need to keep “collapsed *x;” as a pointer, then:
    a) get the device addresses of the two static structs (you might only have one as static, but I did both);
    b) on the CPU/host, set the data struct's x member to the device address of the collapsed struct from step a;
    c) copy the data structure over to the device (or overwrite the existing one).

struct collapsed {
	char **seq;
	int num;
};

// set of collapsed examples
struct data {
	collapsed *x;
	int num;
	int numblocks;
	int *blocksizes;
	float *regmult;
	float *learnmult;
};

__device__ data devData;
__device__ collapsed devCollapsed;

void FillStructs(){
	data hostData;
	collapsed hostCollapsed;

	// fill in hostData and hostCollapsed here.

	// get the device addresses of the two __device__ symbols;
	// cudaGetSymbolAddress writes into the pointer, so pass its address
	data *d_data;
	collapsed *d_collapsed;
	cudaGetSymbolAddress( (void**)&d_data , devData);
	cudaGetSymbolAddress( (void**)&d_collapsed, devCollapsed);

	hostData.x = d_collapsed;

	cudaMemcpy(d_data, &hostData, sizeof(data), cudaMemcpyHostToDevice);
	cudaMemcpy(d_collapsed, &hostCollapsed, sizeof(collapsed), cudaMemcpyHostToDevice);
}

Sorry, I did not mention that collapsed * points to an array inside X. Does this still work? There is also a char pointer array (char **seq) inside collapsed. I guess that with your approach I cannot pass the char pointers to the device.

If unified memory is not available, CPU and GPU pointers are not interchangeable.
(Each side manages its own memory. A pointer is essentially an integer index into that memory, and the values on the two sides come from different memory managers and a different allocation order, so they will not match and a pointer from one side cannot simply be used on the other.)

What you could do is allocate each structure twice, once for the CPU and once for the GPU. Both allocations can be made from the CPU side through the APIs. (This lets you “track” the pointers on both sides and know how to translate between them.)

By keeping two structures that mirror each other exactly, you can pass the GPU pointer to the GPU instead of the CPU pointer.

The CPU pointer can also be used as the source or destination when copying between the CPU and GPU copies with the appropriate APIs.

So, for example, your structures could look something like this:

struct collapsed {
	char **seq_cpu;
	char **seq_gpu;
	int num;
};

// set of collapsed examples
struct data {
	collapsed *x_cpu;
	collapsed *x_gpu;
	int num;
	int numblocks;
	int *blocksizes_cpu;
	int *blocksizes_gpu;
	float *regmult_cpu;
	float *regmult_gpu;
	float *learnmult_cpu;
	float *learnmult_gpu;
};

This will then later allow the copy functions to copy data from cpu to gpu and vice versa from the correct places.
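As a rough illustration of the mirrored-pointer idea for one flat array (such as blocksizes), here is a minimal sketch. The names MirroredInts, uploadInts, downloadInts and addOne are made up for this example and are not part of the structures in this thread; it assumes the plain runtime API without unified memory.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical mirrored pair: same data, one copy per side.
struct MirroredInts {
	int *cpu;    // host memory (malloc)
	int *gpu;    // device memory (cudaMalloc)
	int  count;  // number of elements in each copy
};

// Allocate the device copy and upload the host contents. Returns 0 on success.
int uploadInts(MirroredInts *m) {
	size_t bytes = sizeof(int) * m->count;
	if (cudaMalloc((void **)&m->gpu, bytes) != cudaSuccess) return -1;
	if (cudaMemcpy(m->gpu, m->cpu, bytes, cudaMemcpyHostToDevice) != cudaSuccess) return -1;
	return 0;
}

// Copy the device contents back over the host copy. Returns 0 on success.
int downloadInts(MirroredInts *m) {
	size_t bytes = sizeof(int) * m->count;
	return cudaMemcpy(m->cpu, m->gpu, bytes, cudaMemcpyDeviceToHost) == cudaSuccess ? 0 : -1;
}

// The kernel only ever receives the gpu pointer.
__global__ void addOne(int *v, int n) {
	int i = blockIdx.x * blockDim.x + threadIdx.x;
	if (i < n) v[i] += 1;
}

int main() {
	MirroredInts b;
	b.count = 4;
	b.gpu = NULL;
	b.cpu = (int *)malloc(sizeof(int) * b.count);
	if (!b.cpu) return 1;
	for (int i = 0; i < b.count; i++) b.cpu[i] = i * 10;

	if (uploadInts(&b) != 0) return 1;
	addOne<<<1, 32>>>(b.gpu, b.count);
	if (downloadInts(&b) != 0) return 1;

	for (int i = 0; i < b.count; i++) {
		assert(b.cpu[i] == i * 10 + 1);  // every element was incremented on the device
		printf("%d\n", b.cpu[i]);
	}
	cudaFree(b.gpu);
	free(b.cpu);
	return 0;
}
```

The host code never dereferences b.gpu and the kernel never sees b.cpu; the pair is only a bookkeeping device so the copy calls know both ends.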

I see what you mean, but I cannot figure out how to fill those structures. In addition, it is not my choice to change the contents of the structures, since they are defined by another module implemented by someone else. Isn't there a more direct way to pass the whole structure to the device, using memcpy tricks with device pointers?

// data X is given to me fully populated.
// I try to keep X's scalar members and the X.x array behind the separate device pointers defined below

....

	data *X_dev;
	collapsed *x_dev;

// Junk variables
	char				**seq_tmp;
	char				**seq_tmp2;
	int 				*blocksizes_tmp;
	float				*regmult_tmp;
	float				*learnmult_tmp;
	int					*num_thread_exe; //for counting relative iteration number
....

        cudaMalloc((data **) & X_dev, sizeof(data));
	cudaMalloc((int**)&blocksizes_tmp, sizeof(int)*N);
	cudaMalloc((float**)&regmult_tmp, sizeof(float)*N);
	cudaMalloc((float**)&learnmult_tmp, sizeof(float)*N);
	cudaMemcpy(blocksizes_tmp, X.blocksizes, sizeof(int)*N, cudaMemcpyHostToDevice);
	cudaMemcpy(regmult_tmp, X.regmult, sizeof(float)*N, cudaMemcpyHostToDevice);
	cudaMemcpy(learnmult_tmp, X.learnmult, sizeof(float)*N, cudaMemcpyHostToDevice);
	free(X.blocksizes);
	free(X.regmult);
	free(X.learnmult);
	X.blocksizes = blocksizes_tmp;
	X.regmult = regmult_tmp;
	X.learnmult = learnmult_tmp;
	cudaMemcpy(X_dev, &X, sizeof(data), cudaMemcpyHostToDevice);
	gpuErrchk(cudaMalloc((collapsed **) &x_dev, X.num*sizeof(collapsed)));
	for(int i = 0; i<N; i++){
		cudaMalloc((char***)&seq_tmp,sizeof(char*)*X.x[i].num);
		seq_tmp2 = (char**)malloc(sizeof(char*)*X.x[i].num);
		char** seq_tmp3 = (char**)malloc(sizeof(char*)*X.x[i].num);
		for(int j = 0; j<X.x[i].num; j++){
			cudaMalloc((void **)&(seq_tmp2[j]), sizeof(X.x[i].seq[j]));
			cudaMemcpy(seq_tmp2[j], X.x[i].seq[j],sizeof(X.x[i].seq[j]), cudaMemcpyHostToDevice);
			free(X.x[i].seq[j]);
		}

		gpuErrchk(cudaMemcpy(seq_tmp , seq_tmp2, sizeof(char*)*X.x[i].num, cudaMemcpyHostToDevice));
		X.x[i].seq = seq_tmp;
	}
	cudaMemcpy(x_dev, X.x, sizeof(collapsed)*N, cudaMemcpyHostToDevice);

I wrote this by following http://stackoverflow.com/questions/16695450/cuda-class-with-multidimensional-pointers/16697507#16697507 . It is not working yet. Does this approach look reasonable? I have not been able to solve the problem and I am slowly going crazy :)

I assume your problem is copying an array of pointers to GPU memory.

Copying a “host” pointer to the GPU makes little sense, since the GPU cannot dereference an ordinary host address.

Perhaps the array of pointers is an array of strings. Whatever the case may be, what you will have to do is the following:

CPU Side:

  1. Allocate the array of pointers.
  2. Allocate memory/strings for each pointer in the array.

GPU Side (do the same here):

  1. Allocate the array of pointers by allocating remote (device) memory for it.
  2. Allocate remote memory for each pointer in that array, which gives you the remote pointers to store in it.

What you can then do is copy the strings one by one to the GPU, and then copy the array of remote pointers.
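The steps above could be sketched roughly as follows for one char ** array. This is only a sketch under assumptions: the strings are assumed to be NUL-terminated (so strlen gives the size, whereas sizeof on a char * only gives the size of the pointer), and copyStringsToDevice, freeDeviceStrings and stringLengths are hypothetical names invented for this example.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Hypothetical helper: deep-copies num NUL-terminated host strings.
// Returns a DEVICE pointer to a device array of device string pointers,
// or NULL on failure (device memory already allocated is not released then).
char **copyStringsToDevice(const char *const *hostSeq, int num) {
	// Staging table lives on the host but holds DEVICE addresses.
	char **staging = (char **)malloc(sizeof(char *) * num);
	char **devSeq = NULL;
	if (!staging) return NULL;
	bool ok = true;
	for (int j = 0; j < num && ok; j++) {
		size_t bytes = strlen(hostSeq[j]) + 1;  // string length, not sizeof(char *)
		ok = cudaMalloc((void **)&staging[j], bytes) == cudaSuccess &&
		     cudaMemcpy(staging[j], hostSeq[j], bytes, cudaMemcpyHostToDevice) == cudaSuccess;
	}
	// Upload the table of device addresses itself.
	ok = ok && cudaMalloc((void **)&devSeq, sizeof(char *) * num) == cudaSuccess &&
	     cudaMemcpy(devSeq, staging, sizeof(char *) * num, cudaMemcpyHostToDevice) == cudaSuccess;
	free(staging);
	return ok ? devSeq : NULL;
}

// Fetch the table back so each device string can be freed from the host.
void freeDeviceStrings(char **devSeq, int num) {
	char **staging = (char **)malloc(sizeof(char *) * num);
	if (staging && cudaMemcpy(staging, devSeq, sizeof(char *) * num,
	                          cudaMemcpyDeviceToHost) == cudaSuccess) {
		for (int j = 0; j < num; j++) cudaFree(staging[j]);
	}
	free(staging);
	cudaFree(devSeq);
}

// Verification kernel: walks each string on the device and records its length.
__global__ void stringLengths(char **seq, int num, int *out) {
	int j = blockIdx.x * blockDim.x + threadIdx.x;
	if (j < num) {
		int n = 0;
		while (seq[j][n] != '\0') n++;
		out[j] = n;
	}
}

int main() {
	const char *words[] = {"ACGT", "GG", "TTTACG"};
	const int num = 3;
	int hostLen[num];
	int *devLen = NULL;

	char **devSeq = copyStringsToDevice(words, num);
	if (!devSeq) return 1;
	if (cudaMalloc((void **)&devLen, sizeof(hostLen)) != cudaSuccess) return 1;
	stringLengths<<<1, 32>>>(devSeq, num, devLen);
	if (cudaMemcpy(hostLen, devLen, sizeof(hostLen), cudaMemcpyDeviceToHost) != cudaSuccess) return 1;

	for (int j = 0; j < num; j++) {
		assert(hostLen[j] == (int)strlen(words[j]));  // device saw the same strings
		printf("%s -> %d\n", words[j], hostLen[j]);
	}
	cudaFree(devLen);
	freeDeviceStrings(devSeq, num);
	return 0;
}
```

For the structures in this thread, one possible use would be to build a temporary host array of collapsed whose seq members hold the pointers returned by this helper, and upload that temporary array. That way the original X on the host is left untouched and nothing in it needs to be freed or overwritten.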

Your code isn’t entirely clear to me. I also use the driver API and not the runtime API, so I am not sure exactly how the code you pasted works.

What further complicates the matter is how the pointers have to be passed to the API.

If I were you, I would start with a much simpler example with just 2 pointers, because that is basically what you are trying to copy to the GPU:

a pointer to a pointer (which probably points to some memory, and that memory is probably what you are after anyway). Try inspecting the pointer values in the debugger on both the CPU and the GPU side to check whether the copy went OK.

Try writing some code for that first, instead of a much more complicated example.
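A minimal version of such a two-pointer test might look like the sketch below, written against the runtime API rather than the driver API. The names readThrough and roundTrip are invented for this example, and the -1 error return is just a convention chosen for the sketch (so it cannot be told apart from a stored value of -1).

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Reads through two levels of indirection on the device.
__global__ void readThrough(int **pp, int *out) {
	*out = **pp;
}

// Stores value on the device behind a pointer-to-pointer and reads it back
// through a kernel. Returns the value the kernel saw, or -1 on a CUDA error.
int roundTrip(int value) {
	int  *d_value = NULL;  // device memory holding the int
	int **d_ptr   = NULL;  // device memory holding the ADDRESS stored in d_value
	int  *d_out   = NULL;  // device memory for the kernel's result
	int   result  = -1;

	bool ok = cudaMalloc((void **)&d_value, sizeof(int)) == cudaSuccess &&
	          cudaMalloc((void **)&d_ptr, sizeof(int *)) == cudaSuccess &&
	          cudaMalloc((void **)&d_out, sizeof(int)) == cudaSuccess;

	// First level: copy the payload itself.
	ok = ok && cudaMemcpy(d_value, &value, sizeof(int), cudaMemcpyHostToDevice) == cudaSuccess;
	// Second level: copy the device address (the contents of the host
	// variable d_value), not any host address.
	ok = ok && cudaMemcpy(d_ptr, &d_value, sizeof(int *), cudaMemcpyHostToDevice) == cudaSuccess;

	if (ok) {
		readThrough<<<1, 1>>>(d_ptr, d_out);
		if (cudaMemcpy(&result, d_out, sizeof(int), cudaMemcpyDeviceToHost) != cudaSuccess) {
			result = -1;
		}
		// Pointer values as seen from the host, for comparison in a debugger.
		printf("d_value=%p d_ptr=%p\n", (void *)d_value, (void *)d_ptr);
	}

	cudaFree(d_out);
	cudaFree(d_ptr);
	cudaFree(d_value);
	return result;
}

int main() {
	int seen = roundTrip(42);
	assert(seen == 42);  // the kernel followed both pointers to the payload
	printf("kernel read %d\n", seen);
	return 0;
}
```

Once this works, the char ** member of collapsed is the same pattern repeated: one device allocation per string, plus one device array that holds their device addresses.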